1
00:00:15,259 --> 00:00:13,310
the subject of my talk was changed my

2
00:00:16,429 --> 00:00:15,269
mark just now because I actually wasn't

3
00:00:19,159 --> 00:00:16,439
going to talk about computational

4
00:00:21,769 --> 00:00:19,169
chemistry but actually it's a good way

5
00:00:23,390 --> 00:00:21,779
to introduce I was here at LC for a

6
00:00:25,760 --> 00:00:23,400
couple years as a research scientist and

7
00:00:27,169 --> 00:00:25,770
now I'm at a private company in Tokyo

8
00:00:29,359 --> 00:00:27,179
working on machine learning and

9
00:00:31,490 --> 00:00:29,369
artificial intelligence and one of the

10
00:00:34,369 --> 00:00:31,500
things I found to my surprise I still

11
00:00:36,410 --> 00:00:34,379
come to LC twice a week and all of these

12
00:00:38,450 --> 00:00:36,420
projects have opened up because of the

13
00:00:40,820 --> 00:00:38,460
stuff that I'm doing now in AI it's

14
00:00:43,750 --> 00:00:40,830
actually quite applicable to analyzing

15
00:00:45,889 --> 00:00:43,760
the kind of data that for example

16
00:00:50,360 --> 00:00:45,899
Elizabeth was talking about in her talk

17
00:00:52,040 --> 00:00:50,370
so we have a few limited observations of

18
00:00:54,380 --> 00:00:52,050
exoplanets and we want to know more

19
00:00:56,750 --> 00:00:54,390
about them and it turns out some of the

20
00:00:58,010 --> 00:00:56,760
most the recently developed techniques

21
00:01:01,540 --> 00:00:58,020
and artificial intelligence can actually

22
00:01:05,180 --> 00:01:01,550
help answer that it's everywhere now um

23
00:01:07,130 --> 00:01:05,190
so we had this sort of dead period in

24
00:01:09,529 --> 00:01:07,140
artificial intelligence where you didn't

25
00:01:12,020 --> 00:01:09,539
really hear much about it anymore

26
00:01:14,419 --> 00:01:12,030
and now in the last three years or so

27
00:01:17,690 --> 00:01:14,429
it's been completely expanding it's

28
00:01:19,640 --> 00:01:17,700
everywhere the news covers it companies

29
00:01:23,030 --> 00:01:19,650
are now using it for business purposes

30
00:01:24,980 --> 00:01:23,040
it's used in industry it's it has a lot

31
00:01:26,749 --> 00:01:24,990
of implications for things like privacy

32
00:01:28,010 --> 00:01:26,759
and society because governments apply it

33
00:01:31,130 --> 00:01:28,020
as well

34
00:01:33,260 --> 00:01:31,140
but the kind of artificial intelligence

35
00:01:35,660 --> 00:01:33,270
that people are working on now that

36
00:01:37,490 --> 00:01:35,670
people are developing now this sort of

37
00:01:40,010 --> 00:01:37,500
new artificial intelligence is very

38
00:01:42,679 --> 00:01:40,020
different than the sort of old style

39
00:01:44,270 --> 00:01:42,689
which is more like what you probably

40
00:01:49,850 --> 00:01:44,280
would have read about in science fiction

41
00:01:52,429 --> 00:01:49,860
novels so the mental image of machine

42
00:01:55,190 --> 00:01:52,439
intelligence had the sort of idea that

43
00:01:58,340 --> 00:01:55,200
it's logical and that it works by a

44
00:02:00,020 --> 00:01:58,350
deduction machines can't understand

45
00:02:02,780 --> 00:02:00,030
emotion but they can't understand how

46
00:02:06,940 --> 00:02:02,790
truth kind of applies and they can make

47
00:02:09,979 --> 00:02:06,950
very very wide-ranging sort of plans and

48
00:02:13,130 --> 00:02:09,989
actually ironically people tried to

49
00:02:15,319 --> 00:02:13,140
build that in the 60s and it didn't

50
00:02:18,259 --> 00:02:15,329
really work and we're kind of in a local

51
00:02:21,110 --> 00:02:18,269
way but it was very fragile so when you

52
00:02:22,970 --> 00:02:21,120
try to have it deal with raw data with

53
00:02:24,300 --> 00:02:22,980
images from a camera rather than

54
00:02:26,580 --> 00:02:24,310
something constructed in our

55
00:02:31,020 --> 00:02:26,590
put into shape by a human practitioner

56
00:02:32,730 --> 00:02:31,030
it just would suddenly break as a result

57
00:02:37,050 --> 00:02:32,740
of this people stopped funding AI

58
00:02:38,880 --> 00:02:37,060
research by and large in the 70s and at

59
00:02:40,710 --> 00:02:38,890
the same time there were techniques

60
00:02:43,830 --> 00:02:40,720
still being used in industry and

61
00:02:46,860 --> 00:02:43,840
business that did what we would just

62
00:02:49,590 --> 00:02:46,870
call statistics so they just analyzed

63
00:02:51,420 --> 00:02:49,600
data they tried to extract correlations

64
00:02:54,030 --> 00:02:51,430
from data and got more and more

65
00:02:56,610 --> 00:02:54,040
sophisticated with that at a recent

66
00:02:59,250 --> 00:02:56,620
point around 2000 those techniques

67
00:03:00,690 --> 00:02:59,260
really came into their own and with the

68
00:03:02,400 --> 00:03:00,700
increase in computing power some of the

69
00:03:05,340 --> 00:03:02,410
old techniques that were seen as more of

70
00:03:07,380 --> 00:03:05,350
an actual dedicated model of the brain

71
00:03:09,600 --> 00:03:07,390
or artificial intelligence kind of thing

72
00:03:10,770 --> 00:03:09,610
ended up almost being merged with these

73
00:03:11,910 --> 00:03:10,780
business techniques and these

74
00:03:13,740 --> 00:03:11,920
statistical techniques that had been

75
00:03:15,990 --> 00:03:13,750
steadily growing throughout this sort of

76
00:03:17,850 --> 00:03:16,000
dead zone of AI in the 70s and so now we

77
00:03:19,890 --> 00:03:17,860
have something that's very different

78
00:03:24,300 --> 00:03:19,900
than the science-fiction image of a kind

79
00:03:26,910 --> 00:03:24,310
of a logical deductive robot so in the

80
00:03:29,130 --> 00:03:26,920
old style of AI the idea is to

81
00:03:30,650 --> 00:03:29,140
explicitly represent human knowledge and

82
00:03:33,810 --> 00:03:30,660
then look at how human knowledge

83
00:03:35,190 --> 00:03:33,820
interacts so you have a few statements

84
00:03:36,750 --> 00:03:35,200
that you know are true and you have a

85
00:03:38,460 --> 00:03:36,760
few rules of how statements can be

86
00:03:40,259 --> 00:03:38,470
combined with each other and through

87
00:03:41,699 --> 00:03:40,269
that you generate everything that's true

88
00:03:44,220 --> 00:03:41,709
or everything that you could possibly

89
00:03:46,560 --> 00:03:44,230
know is true and that was sort of the

90
00:03:50,009 --> 00:03:46,570
mental image of what intelligence was

91
00:03:52,650 --> 00:03:50,019
like and the problem is this requires a

92
00:03:54,120 --> 00:03:52,660
lot of work from humans to set the thing

93
00:03:56,250 --> 00:03:54,130
up in a way that those rules are going

94
00:03:58,289 --> 00:03:56,260
to let the system go anywhere other than

95
00:04:00,479 --> 00:03:58,299
what was imagined so even though this

96
00:04:03,420 --> 00:04:00,489
could in principle extrapolate very far

97
00:04:05,130 --> 00:04:03,430
in practice it was brittle if you change

98
00:04:06,599 --> 00:04:05,140
things a little bit then suddenly these

99
00:04:09,240 --> 00:04:06,609
rules don't apply it doesn't really help

100
00:04:12,750 --> 00:04:09,250
you anymore the new approach basically

101
00:04:15,210 --> 00:04:12,760
is very hands-off human practitioners of

102
00:04:17,430 --> 00:04:15,220
it tried not to put anything in if at

103
00:04:21,659 --> 00:04:17,440
all possible let the Machine figure out

104
00:04:23,760 --> 00:04:21,669
what it can from raw data and the kind

105
00:04:27,480 --> 00:04:23,770
of message of this is the data itself

106
00:04:29,130 --> 00:04:27,490
contains almost a way to figure out how

107
00:04:30,750 --> 00:04:29,140
to process it if you just have enough

108
00:04:32,279 --> 00:04:30,760
data and you use robust enough

109
00:04:34,050 --> 00:04:32,289
techniques

110
00:04:37,500 --> 00:04:34,060
the result is you have a machine that

111
00:04:38,040 --> 00:04:37,510
almost acts like human intuition it can

112
00:04:40,080 --> 00:04:38,050
make

113
00:04:41,850 --> 00:04:40,090
guesses the guesses are generally

114
00:04:45,300 --> 00:04:41,860
correct or they're more correct than

115
00:04:47,939 --> 00:04:45,310
chance but at the same time it's really

116
00:04:50,460 --> 00:04:47,949
hard for the machine to explain why is

117
00:04:52,110 --> 00:04:50,470
this right why do I think this and that

118
00:04:54,360 --> 00:04:52,120
makes it very challenging for humans to

119
00:04:57,480 --> 00:04:54,370
interact with and to understand when

120
00:04:59,520 --> 00:04:57,490
things are going wrong so I want to kind

121
00:05:02,939 --> 00:04:59,530
of explain what this is

122
00:05:04,710 --> 00:05:02,949
concretely one of the the most common

123
00:05:06,270 --> 00:05:04,720
techniques now is using neural networks

124
00:05:07,680 --> 00:05:06,280
and these are on all sorts of different

125
00:05:10,920 --> 00:05:07,690
forms so this is just a sort of a

126
00:05:13,230 --> 00:05:10,930
prototype a toy toy example and there's

127
00:05:15,290 --> 00:05:13,240
you have some input to the network in

128
00:05:19,350 --> 00:05:15,300
this case an image of my cat

129
00:05:24,089 --> 00:05:19,360
that input is encoded in numbers so

130
00:05:25,830 --> 00:05:24,099
pixel values things like that and those

131
00:05:29,520 --> 00:05:25,840
numbers are then pushed through a

132
00:05:32,100 --> 00:05:29,530
network of successive computations so I

133
00:05:34,529 --> 00:05:32,110
start with these inputs in the next

134
00:05:36,839 --> 00:05:34,539
layer they become some new numbers in

135
00:05:39,270 --> 00:05:36,849
the next layer they become some new

136
00:05:41,610 --> 00:05:39,280
numbers and so on each of these layers

137
00:05:43,439 --> 00:05:41,620
is related to each other by a

138
00:05:46,499 --> 00:05:43,449
mathematical operation which I've

139
00:05:48,360 --> 00:05:46,509
defined and I put some kind of unknown

140
00:05:52,050 --> 00:05:48,370
values into that operation in this case

141
00:05:53,640 --> 00:05:52,060
these are these these weights so the way

142
00:05:56,730 --> 00:05:53,650
that the red layer goes to the green

143
00:06:00,600 --> 00:05:56,740
layer is specified by these weights and

144
00:06:02,850 --> 00:06:00,610
those weights will change in order to

145
00:06:04,860 --> 00:06:02,860
make the predictions better so right now

146
00:06:09,450 --> 00:06:04,870
this network is very bad it thinks this

147
00:06:12,360 --> 00:06:09,460
is a dog 70% chance but since I know how

148
00:06:14,640 --> 00:06:12,370
I got those numbers I can change the

149
00:06:16,740 --> 00:06:14,650
network to make it so that when shown

150
00:06:19,290 --> 00:06:16,750
the same thing now it's going to say cat

151
00:06:20,909 --> 00:06:19,300
and I can figure out by going back

152
00:06:22,950 --> 00:06:20,919
through the network by propagating those

153
00:06:24,990 --> 00:06:22,960
errors backwards through the same

154
00:06:26,399 --> 00:06:25,000
calculation I've just done I can figure

155
00:06:28,320 --> 00:06:26,409
out how each of those weights should

156
00:06:30,420 --> 00:06:28,330
change just a little bit to make it a

157
00:06:32,909 --> 00:06:30,430
little bit less mistaken on this one

158
00:06:34,800 --> 00:06:32,919
case and then in the future this is

159
00:06:38,369 --> 00:06:34,810
exaggerated but the next time I run it

160
00:06:40,260 --> 00:06:38,379
now it thinks probably a cat in practice

161
00:06:43,379 --> 00:06:40,270
if I do this on one image and if I just

162
00:06:46,230 --> 00:06:43,389
do this once the next cat may be a black

163
00:06:48,180 --> 00:06:46,240
cat it didn't help but if I do this on a

164
00:06:50,639 --> 00:06:48,190
hundred million cats and I do it over

165
00:06:51,820 --> 00:06:50,649
and over and over I gradually remove all

166
00:06:54,220 --> 00:06:51,830
of the ways that the network

167
00:06:56,650 --> 00:06:54,230
makes mistakes at least as covered by

168
00:06:58,480 --> 00:06:56,660
the data that I've given it but at the

169
00:07:03,220 --> 00:06:58,490
end of this it can't tell me why it

170
00:07:04,750 --> 00:07:03,230
thinks this is a cat so another property

171
00:07:08,920 --> 00:07:04,760
of these is that they're very good at

172
00:07:10,960 --> 00:07:08,930
filling in gaps in between what it's

173
00:07:13,120 --> 00:07:10,970
shown so if I show it a black cat and a

174
00:07:14,980 --> 00:07:13,130
gray cat and an orange cat and so on it

175
00:07:16,900 --> 00:07:14,990
can kind of figure out what's between

176
00:07:19,300 --> 00:07:16,910
them but if I show it a very different

177
00:07:21,510 --> 00:07:19,310
kind of cat instead of showing a you

178
00:07:24,940 --> 00:07:21,520
know a house cat I show it a cheetah

179
00:07:26,410 --> 00:07:24,950
then it's hopeless it's never seen

180
00:07:28,690 --> 00:07:26,420
anything like it it doesn't know how to

181
00:07:31,240 --> 00:07:28,700
fix the errors that it has built into it

182
00:07:34,060 --> 00:07:31,250
about how to process a cheetah so it's

183
00:07:35,980 --> 00:07:34,070
very good at this sort of inside of what

184
00:07:38,170 --> 00:07:35,990
it sees but outside of what it sees like

185
00:07:42,430 --> 00:07:38,180
here I've taken one of these things and

186
00:07:44,500 --> 00:07:42,440
I've just trained it on the blue data in

187
00:07:46,510 --> 00:07:44,510
between that's fine outside it doesn't

188
00:07:48,130 --> 00:07:46,520
extrapolate at all it doesn't understand

189
00:07:51,400 --> 00:07:48,140
this pattern should be repeated over and

190
00:07:53,550 --> 00:07:51,410
over and over which means that in the

191
00:07:55,930 --> 00:07:53,560
end of the day this kind of AI is very

192
00:07:58,150 --> 00:07:55,940
strictly limited by the quality of the

193
00:08:00,490 --> 00:07:58,160
data that it can be provided it can't

194
00:08:02,590 --> 00:08:00,500
become infinitely good just sitting in a

195
00:08:04,690 --> 00:08:02,600
room it has to constantly be receiving

196
00:08:06,670 --> 00:08:04,700
some kind of feedback and information

197
00:08:10,810 --> 00:08:06,680
from its environment and that controls

198
00:08:12,610 --> 00:08:10,820
what happens google recently did a study

199
00:08:15,580 --> 00:08:12,620
where they took all of the images from

200
00:08:17,770 --> 00:08:15,590
YouTube and all these photo sharing

201
00:08:19,960 --> 00:08:17,780
sites and things like that made a data

202
00:08:21,970 --> 00:08:19,970
set with 300 million images and that

203
00:08:23,920 --> 00:08:21,980
still performed noticeably better than

204
00:08:25,540 --> 00:08:23,930
the best AI people could generate with a

205
00:08:27,850 --> 00:08:25,550
hundred million images with ten million

206
00:08:30,070 --> 00:08:27,860
images and there's just a simple

207
00:08:32,620 --> 00:08:30,080
logarithmic scaling of performance with

208
00:08:34,540 --> 00:08:32,630
the amount of data so these things are

209
00:08:36,490 --> 00:08:34,550
very data bald necked

210
00:08:39,130 --> 00:08:36,500
that determines how good the thing you

211
00:08:41,980 --> 00:08:39,140
get is going to be all right what's

212
00:08:43,180 --> 00:08:41,990
going on inside of the network they're

213
00:08:45,640 --> 00:08:43,190
hard to interpret but they're not

214
00:08:47,620 --> 00:08:45,650
impossible the nice thing about having

215
00:08:49,450 --> 00:08:47,630
something on a computer is we can cut it

216
00:08:50,890 --> 00:08:49,460
open and see what's going on exactly and

217
00:08:54,220 --> 00:08:50,900
we can change something a little bit and

218
00:08:57,040 --> 00:08:54,230
see how that changes and the sort of

219
00:08:59,350 --> 00:08:57,050
image that has emerged from studying

220
00:09:00,730 --> 00:08:59,360
these things empirically is that what

221
00:09:03,220 --> 00:09:00,740
they really do is they filter out

222
00:09:05,660 --> 00:09:03,230
irrelevant information so in that first

223
00:09:08,180 --> 00:09:05,670
layer they keep

224
00:09:08,900 --> 00:09:08,190
as much as they can but then the next

225
00:09:10,700 --> 00:09:08,910
layer down

226
00:09:13,010 --> 00:09:10,710
they've discarded a few things that

227
00:09:14,180 --> 00:09:13,020
ended up not being very constructive to

228
00:09:15,590 --> 00:09:14,190
the question that was being asked and

229
00:09:18,020 --> 00:09:15,600
the next layer down they discard a bit

230
00:09:20,270 --> 00:09:18,030
more and a bit more and at the end the

231
00:09:22,400 --> 00:09:20,280
only information that exists in the

232
00:09:24,050 --> 00:09:22,410
network at the very last layer is what

233
00:09:26,480 --> 00:09:24,060
the network needs to answer the

234
00:09:28,400 --> 00:09:26,490
questions it's been asked but

235
00:09:30,500 --> 00:09:28,410
interestingly at the top at the very

236
00:09:34,160 --> 00:09:30,510
first layer it throws out things that

237
00:09:36,590 --> 00:09:34,170
are just generally not useful so in this

238
00:09:38,540 --> 00:09:36,600
case this is a network that's trained to

239
00:09:41,750 --> 00:09:38,550
identify the age and gender of a face

240
00:09:44,390 --> 00:09:41,760
and at the top layer you can see it can

241
00:09:46,430 --> 00:09:44,400
still see the rims of my glasses it can

242
00:09:48,680 --> 00:09:46,440
see my mouth all of these different

243
00:09:50,300 --> 00:09:48,690
things are the activations of one neuron

244
00:09:52,430 --> 00:09:50,310
in the network and I've just picked

245
00:09:54,950 --> 00:09:52,440
seven or seven neurons for that layer to

246
00:09:56,630 --> 00:09:54,960
visualize but there's 128 of them I

247
00:09:58,490 --> 00:09:56,640
think the way that it's actually

248
00:10:00,470 --> 00:09:58,500
detecting is if I actually look at it it

249
00:10:03,230 --> 00:10:00,480
finds very general features of images

250
00:10:05,570 --> 00:10:03,240
that are robust like edges circles

251
00:10:08,480 --> 00:10:05,580
corners things like that but it's not

252
00:10:11,390 --> 00:10:08,490
very responsive to say a noise pattern

253
00:10:14,150 --> 00:10:11,400
and that's because even at the most

254
00:10:15,890 --> 00:10:14,160
simple level ignoring noise is just a

255
00:10:17,180 --> 00:10:15,900
generally good strategy to being able to

256
00:10:18,860 --> 00:10:17,190
answer the question deeper in the

257
00:10:20,900 --> 00:10:18,870
network but as I go further and further

258
00:10:22,160 --> 00:10:20,910
down I stop being able to actually

259
00:10:24,620 --> 00:10:22,170
reconstruct some of these features

260
00:10:26,870 --> 00:10:24,630
anymore I have almost lost my glasses

261
00:10:28,700 --> 00:10:26,880
here here they're pretty much gone here

262
00:10:31,250 --> 00:10:28,710
I can't even see my face anymore there's

263
00:10:32,870 --> 00:10:31,260
just some statistical correlations that

264
00:10:34,370 --> 00:10:32,880
the network's held on to that are

265
00:10:37,040 --> 00:10:34,380
informative about age and gender

266
00:10:39,440 --> 00:10:37,050
so networks are generalized that is they

267
00:10:41,540 --> 00:10:39,450
they can work on things that they

268
00:10:43,280 --> 00:10:41,550
haven't seen because they're throwing

269
00:10:44,780 --> 00:10:43,290
away all of the distracting factors that

270
00:10:46,430 --> 00:10:44,790
actually make those things different by

271
00:10:48,230 --> 00:10:46,440
the end of the network all of these

272
00:10:49,430 --> 00:10:48,240
faces are basically the same face as

273
00:10:53,510 --> 00:10:49,440
long as they have the same age and

274
00:10:55,790 --> 00:10:53,520
gender so you guys what do these things

275
00:10:57,650 --> 00:10:55,800
actually know if a network can classify

276
00:10:59,870 --> 00:10:57,660
something does that mean it understands

277
00:11:03,710 --> 00:10:59,880
what it is there's the technique for

278
00:11:06,020 --> 00:11:03,720
actually asking the network or asking it

279
00:11:07,730 --> 00:11:06,030
to reconstruct an input that would

280
00:11:10,970 --> 00:11:07,740
convince it that this is this kind of

281
00:11:13,400 --> 00:11:10,980
class so in this case for example this

282
00:11:15,380 --> 00:11:13,410
is the network that's freely available

283
00:11:16,970 --> 00:11:15,390
people trained to have a very large data

284
00:11:19,069 --> 00:11:16,980
set called imagenet

285
00:11:21,379 --> 00:11:19,079
and it has a thousand different objects

286
00:11:23,389 --> 00:11:21,389
that it knows how to recognize I picked

287
00:11:25,310 --> 00:11:23,399
three that I thought would be very

288
00:11:28,610 --> 00:11:25,320
visually distinctive and I generated

289
00:11:30,259 --> 00:11:28,620
these three reconstructions this is what

290
00:11:33,079 --> 00:11:30,269
it thinks an image that would convince

291
00:11:35,689 --> 00:11:33,089
it it's a Pekingese dog and you can see

292
00:11:39,519 --> 00:11:35,699
there's some eyes and maybe some of the

293
00:11:44,509 --> 00:11:42,620
this is a flamingo and you can see the

294
00:11:46,129 --> 00:11:44,519
kind of the bird shape and some wings

295
00:11:48,500 --> 00:11:46,139
and the color patterns of a flamingo but

296
00:11:49,970 --> 00:11:48,510
again it's not like a flamingo situated

297
00:11:52,879 --> 00:11:49,980
in a realistic background it's not a

298
00:11:54,769 --> 00:11:52,889
full image it's just bits of the idea of

299
00:11:58,370 --> 00:11:54,779
a flamingo that are relevant to it that

300
00:12:00,350 --> 00:11:58,380
it thinks are indicative and the same

301
00:12:02,090 --> 00:12:00,360
for a snake where you can see scales you

302
00:12:04,720 --> 00:12:02,100
can see the kind of looks like the belly

303
00:12:07,819 --> 00:12:04,730
scales and an eye and maybe the face

304
00:12:09,740 --> 00:12:07,829
alright so in the in the case of that

305
00:12:12,019 --> 00:12:09,750
reconstruction that's what the network

306
00:12:15,319 --> 00:12:12,029
knows but it doesn't know that it knows

307
00:12:17,090 --> 00:12:15,329
the network has no sort of introspective

308
00:12:20,210 --> 00:12:17,100
process where it queries itself it just

309
00:12:21,769 --> 00:12:20,220
does things it's like when you when you

310
00:12:23,000 --> 00:12:21,779
catch a ball you don't have time to

311
00:12:24,379 --> 00:12:23,010
think about what are you going to do you

312
00:12:26,420 --> 00:12:24,389
just do it and then you can look at it

313
00:12:28,460 --> 00:12:26,430
after the fact and say this is what it

314
00:12:29,900 --> 00:12:28,470
felt like to catch a ball the network I

315
00:12:32,750 --> 00:12:29,910
described didn't have that mechanism

316
00:12:34,540 --> 00:12:32,760
it's just a it's just like your visual

317
00:12:37,639 --> 00:12:34,550
system or something very reflexive

318
00:12:39,319 --> 00:12:37,649
however we can intentionally design

319
00:12:40,939 --> 00:12:39,329
networks that have this kind of

320
00:12:43,879 --> 00:12:40,949
reflective mechanism or this kind of

321
00:12:47,150 --> 00:12:43,889
introspective mechanism one of the

322
00:12:49,280 --> 00:12:47,160
things that lets us do this we have a

323
00:12:50,689 --> 00:12:49,290
large amount of data available to us in

324
00:12:52,610 --> 00:12:50,699
our senses all the time but we can only

325
00:12:55,519 --> 00:12:52,620
pay attention to a little bit and we

326
00:12:58,100 --> 00:12:55,529
direct that attention to pay attention

327
00:12:59,960 --> 00:12:58,110
to what's to look at or to only take the

328
00:13:03,590 --> 00:12:59,970
data that's relevant to what we're

329
00:13:05,480 --> 00:13:03,600
trying to do the network that's static

330
00:13:08,150 --> 00:13:05,490
that I described before all of that

331
00:13:09,860 --> 00:13:08,160
stuff is frozen in it says I'm going to

332
00:13:12,470 --> 00:13:09,870
discard this information at this layer

333
00:13:14,840 --> 00:13:12,480
that's how I am but you can make it

334
00:13:16,819 --> 00:13:14,850
dynamic instead you can ask the network

335
00:13:19,430 --> 00:13:16,829
to direct what it discards based on the

336
00:13:22,220 --> 00:13:19,440
current circumstances and this kind of

337
00:13:24,880 --> 00:13:22,230
attentional model is actually really

338
00:13:28,579 --> 00:13:24,890
good for processing human language so

339
00:13:30,269 --> 00:13:28,589
the the newest Google state-of-the-art

340
00:13:32,369 --> 00:13:30,279
machine translation uses

341
00:13:34,259 --> 00:13:32,379
attention to model the relationship

342
00:13:36,449 --> 00:13:34,269
between words in the sentence and you

343
00:13:39,179 --> 00:13:36,459
can train it on about a 10 thousand

344
00:13:40,590 --> 00:13:39,189
times less in computer cycles than if

345
00:13:43,050 --> 00:13:40,600
you do one that doesn't have this kind

346
00:13:45,090 --> 00:13:43,060
of attention mechanism so this makes a

347
00:13:46,739 --> 00:13:45,100
really big difference to the performance

348
00:13:49,980 --> 00:13:46,749
of this kind of of this kind of

349
00:13:52,110 --> 00:13:49,990
technology all of what I've been talking

350
00:13:53,850 --> 00:13:52,120
about is networks that essentially get

351
00:13:56,100 --> 00:13:53,860
rid of information they discard what

352
00:13:58,139 --> 00:13:56,110
they don't care about you might ask well

353
00:14:00,389 --> 00:13:58,149
if I want to make something that creates

354
00:14:02,819 --> 00:14:00,399
new information if I want a creative AI

355
00:14:04,439 --> 00:14:02,829
could I do that and there's a technique

356
00:14:07,499 --> 00:14:04,449
that came out a few years ago called

357
00:14:09,059 --> 00:14:07,509
generative adversarial networks the

358
00:14:10,619 --> 00:14:09,069
technique here is that you have two

359
00:14:12,840 --> 00:14:10,629
networks that interact with each other

360
00:14:15,269 --> 00:14:12,850
and they essentially fight to try to

361
00:14:17,369 --> 00:14:15,279
fool each other one network produces a

362
00:14:18,840 --> 00:14:17,379
fake image and the other Network tries

363
00:14:21,689 --> 00:14:18,850
to learn how to tell the difference

364
00:14:24,929 --> 00:14:21,699
between reality and fake images and as a

365
00:14:27,119 --> 00:14:24,939
result they sort of explore the space of

366
00:14:30,030 --> 00:14:27,129
images to figure out what makes an image

367
00:14:32,610 --> 00:14:30,040
realistic and you can get very nice

368
00:14:34,920 --> 00:14:32,620
results even with a very small amount of

369
00:14:37,319 --> 00:14:34,930
data for instance this these are all

370
00:14:39,689 --> 00:14:37,329
fake all of these butterflies are

371
00:14:42,780 --> 00:14:39,699
generated by the generative part of the

372
00:14:44,610 --> 00:14:42,790
network from a noise pattern however the

373
00:14:47,579 --> 00:14:44,620
generator and discriminator were trained

374
00:14:50,519 --> 00:14:47,589
on a data set of I think it's three

375
00:14:53,220 --> 00:14:50,529
thousand three thousand butterfly images

376
00:14:56,220 --> 00:14:53,230
that look very similar to this but are

377
00:14:58,470 --> 00:14:56,230
different in detail as a result of this

378
00:15:01,889 --> 00:14:58,480
the model that the generator learns

379
00:15:03,900 --> 00:15:01,899
allows us to even generate intervening

380
00:15:06,389 --> 00:15:03,910
butterflies we can generate a butterfly

381
00:15:08,309 --> 00:15:06,399
halfway between two species or halfway

382
00:15:10,379 --> 00:15:08,319
between two genders about a fly halfway

383
00:15:12,329 --> 00:15:10,389
between front and back very strange

384
00:15:14,400 --> 00:15:12,339
things like that that don't exist in the

385
00:15:16,650 --> 00:15:14,410
world but that the understanding that

386
00:15:21,509 --> 00:15:16,660
the machine has developed allows it to

387
00:15:23,040 --> 00:15:21,519
imagine another interesting thing about

388
00:15:25,470 --> 00:15:23,050
this is once you have these creative

389
00:15:28,079 --> 00:15:25,480
machines well what's the relationship

390
00:15:30,239 --> 00:15:28,089
between that and human artists so you

391
00:15:32,189 --> 00:15:30,249
could just try to let the machine do

392
00:15:35,879 --> 00:15:32,199
everything but a more sort of

393
00:15:38,189 --> 00:15:35,889
interesting or a useful way to go about

394
00:15:40,829 --> 00:15:38,199
this is to ask what would a human artist

395
00:15:42,269 --> 00:15:40,839
do when given access to the machine the

396
00:15:45,100 --> 00:15:42,279
nice thing about this kind of technique

397
00:15:48,110 --> 00:15:45,110
is you can give it cute

398
00:15:49,910 --> 00:15:48,120
you can say I want you to draw something

399
00:15:52,189 --> 00:15:49,920
that fits within a certain outline so

400
00:15:55,309 --> 00:15:52,199
this is something called pics - pics and

401
00:15:57,139 --> 00:15:55,319
I drew this very bad cat and it

402
00:16:00,019 --> 00:15:57,149
generated this it filled in the details

403
00:16:02,420 --> 00:16:00,029
or if I give it something that's not a

404
00:16:04,069 --> 00:16:02,430
cat it still tries to make it into

405
00:16:06,079 --> 00:16:04,079
something like you cat it takes my cue

406
00:16:09,350 --> 00:16:06,089
and then it's just another tool I can

407
00:16:10,999 --> 00:16:09,360
use like a paintbrush this is an example

408
00:16:13,819 --> 00:16:11,009
of a piece of software you can download

409
00:16:16,660 --> 00:16:13,829
this it involves installing quite a lot

410
00:16:18,829 --> 00:16:16,670
of stuff unfortunately but it

411
00:16:20,780 --> 00:16:18,839
understands mountains at least to some

412
00:16:23,930 --> 00:16:20,790
degree and it can take cues of the color

413
00:16:26,300 --> 00:16:23,940
of the sky or the grass things like that

414
00:16:30,019 --> 00:16:26,310
and generate a mountain responding to

415
00:16:32,180 --> 00:16:30,029
your to your input all of this stuff is

416
00:16:34,759 --> 00:16:32,190
kind of data processing still even the

417
00:16:36,050 --> 00:16:34,769
generative stuff another place that

418
00:16:38,600 --> 00:16:36,060
people try to use artificial

419
00:16:40,819 --> 00:16:38,610
intelligence is to drive behaviors so a

420
00:16:43,699 --> 00:16:40,829
big example of this is self-driving cars

421
00:16:45,740 --> 00:16:43,709
you want the machine to control the

422
00:16:47,420 --> 00:16:45,750
motion of the vehicle it has to choose

423
00:16:49,850 --> 00:16:47,430
where to go it has to choose how to

424
00:16:51,800 --> 00:16:49,860
respond to a situation it has to

425
00:16:55,670 --> 00:16:51,810
generate actions not just correctly

426
00:16:58,370 --> 00:16:55,680
classify a perceptual input and this

427
00:17:00,439 --> 00:16:58,380
creates a really big problem when I want

428
00:17:02,120 --> 00:17:00,449
to classify perceptual input I can be

429
00:17:04,100 --> 00:17:02,130
like Google and collect 300 million

430
00:17:07,039 --> 00:17:04,110
images independent of any artificial

431
00:17:08,899 --> 00:17:07,049
intelligence development I can just

432
00:17:11,449 --> 00:17:08,909
gather as much data as I want and I'll

433
00:17:13,039 --> 00:17:11,459
get a very good AI out of that if I want

434
00:17:16,880 --> 00:17:13,049
something that's actually taking action

435
00:17:19,039 --> 00:17:16,890
it needs to know what its actions would

436
00:17:21,500 --> 00:17:19,049
do so if I want to understand how to

437
00:17:23,990 --> 00:17:21,510
make a safe self-driving car that

438
00:17:26,120 --> 00:17:24,000
doesn't or that can say recover from a

439
00:17:28,100 --> 00:17:26,130
near crash situation like if it starts

440
00:17:29,899 --> 00:17:28,110
to swerve off the road I want it to be

441
00:17:32,200 --> 00:17:29,909
able to recover I need to collect data

442
00:17:33,860 --> 00:17:32,210
of that car swerving off the road

443
00:17:36,350 --> 00:17:33,870
because these things can only

444
00:17:37,789 --> 00:17:36,360
interpolate so they need to have some

445
00:17:39,890 --> 00:17:37,799
experience of the situation's they're

446
00:17:42,470 --> 00:17:39,900
like please deal with or that you want

447
00:17:45,020 --> 00:17:42,480
them to behave competently in and that

448
00:17:46,940 --> 00:17:45,030
means for this kind of AI right now it's

449
00:17:49,970 --> 00:17:46,950
very far behind the sort of perceptual

450
00:17:52,010 --> 00:17:49,980
AI because of this data limit because

451
00:17:55,520 --> 00:17:52,020
you actually need to engage the AI in

452
00:17:57,080 --> 00:17:55,530
those real world control situations in

453
00:17:58,269 --> 00:17:57,090
order for it to actually have what it

454
00:18:01,879 --> 00:17:58,279
needs to

455
00:18:03,590 --> 00:18:01,889
alright so in conclusion the the main

456
00:18:05,299 --> 00:18:03,600
thing about modern AI compared to the

457
00:18:07,159 --> 00:18:05,309
sort of science-fiction images it's all

458
00:18:10,099 --> 00:18:07,169
experience driven it's all very

459
00:18:12,169 --> 00:18:10,109
intuitive it's essentially learning from

460
00:18:14,869 --> 00:18:12,179
doing a massive statistical analysis of

461
00:18:18,229 --> 00:18:14,879
lifetime or many human lifetimes worth

462
00:18:20,479 --> 00:18:18,239
of data that it's given but it doesn't

463
00:18:22,789 --> 00:18:20,489
actually work by deduction it doesn't

464
00:18:25,310 --> 00:18:22,799
work by following into some kind of

465
00:18:27,979 --> 00:18:25,320
extrapolate Ori story or what we would

466
00:18:30,589 --> 00:18:27,989
generally call understanding the nice

467
00:18:32,659 --> 00:18:30,599
thing about this setup is that it means

468
00:18:34,190 --> 00:18:32,669
you don't actually have to understand

469
00:18:35,330 --> 00:18:34,200
how to do the task you want to teach it

470
00:18:38,149 --> 00:18:35,340
you just have to have a lot of examples

471
00:18:40,460 --> 00:18:38,159
and there's a lot of flexibility in that

472
00:18:42,560 --> 00:18:40,470
you can set up the network with very

473
00:18:45,859 --> 00:18:42,570
complicated structures and let it take

474
00:18:48,289 --> 00:18:45,869
care of itself given the data and given

475
00:18:49,940 --> 00:18:48,299
a network the optimization process of

476
00:18:51,979 --> 00:18:49,950
fixing all the small errors bit by bit

477
00:18:53,419 --> 00:18:51,989
will take it to some kind of functional

478
00:18:55,099 --> 00:18:53,429
state and you don't actually have to

479
00:18:56,810 --> 00:18:55,109
understand how it's doing it at the end

480
00:18:58,700 --> 00:18:56,820
you just have to set up a circumstance

481
00:19:00,279 --> 00:18:58,710
where if it works you know it and you

482
00:19:03,320 --> 00:19:00,289
get what you want

483
00:19:05,810 --> 00:19:03,330
however because it can't extrapolate

484
00:19:07,969 --> 00:19:05,820
this puts a really strong limit on where

485
00:19:10,249 --> 00:19:07,979
we can easily use it versus where we

486
00:19:11,989 --> 00:19:10,259
might like to use it but it's not really

487
00:19:14,659 --> 00:19:11,999
going to give us what we want if we want

488
00:19:16,820 --> 00:19:14,669
to do things in very new situations

489
00:19:18,259 --> 00:19:16,830
that's a really big problem right now

490
00:19:20,359 --> 00:19:18,269
and learning how to make an actual

491
00:19:22,249 --> 00:19:20,369
functional extrapolate of AI I think

492
00:19:25,039 --> 00:19:22,259
right now is is one of the key problems

493
00:19:26,479 --> 00:19:25,049
and the data requirements pose certain

494
00:19:28,879 --> 00:19:26,489
limits on where and when you can

495
00:19:33,940 --> 00:19:28,889
actually use these technologies so thank